Ensembling Large Language Models to Characterize Affective Dynamics in Student-AI Tutor Dialogues

Zhang, Chenyu, Alghowinem, Sharifa, Breazeal, Cynthia

arXiv.org Artificial Intelligence

While recent studies have examined the learning impact of large language models (LLMs) in educational contexts, the affective dynamics of LLM-mediated tutoring remain insufficiently understood. This work introduces the first ensemble-LLM framework for large-scale affect sensing in tutoring dialogues, advancing the conversation on responsible pathways for integrating generative AI into education by attending to learners' evolving affective states. To achieve this, we analyzed two semesters' worth of data comprising 16,986 conversational turns exchanged between PyTutor, an LLM-powered AI tutor, and 261 undergraduate learners across three U.S. institutions. To investigate learners' emotional experiences, we generated zero-shot affect annotations from three frontier LLMs (Gemini, GPT-4o, Claude), including scalar ratings of valence, arousal, and learning-helpfulness, along with free-text emotion labels. These estimates are fused through rank-weighted intra-model pooling and plurality consensus across models to produce robust emotion profiles. Our analysis shows that during interaction with the AI tutor, students typically report mildly positive affect and moderate arousal. Yet learning is not uniformly smooth: confusion and curiosity are frequent companions to problem solving, and frustration, while less common, still surfaces in ways that can derail progress. Emotional states are short-lived: positive moments last slightly longer than neutral or negative ones, but they are fragile and easily disrupted. Encouragingly, negative emotions often resolve quickly, sometimes rebounding directly into positive states. Neutral moments frequently act as turning points, more often steering students upward than downward, suggesting opportunities for tutors to intervene at precisely these junctures.
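The fusion step described above combines two stages: pooling repeated scalar ratings within each model, then taking a plurality vote over the categorical labels across models. The abstract does not specify the exact weighting scheme, so the sketch below is a minimal illustration under assumed details: harmonic rank weights for the intra-model pooling, and a simple majority count for the cross-model consensus. Function names (`rank_weighted_pool`, `plurality_consensus`) are hypothetical, not from the paper.

```python
from collections import Counter

def rank_weighted_pool(samples):
    """Pool repeated scalar ratings (e.g., valence) from one model.

    Assumed scheme: sort ratings in descending order and weight them
    by harmonic rank (1, 1/2, 1/3, ...), so higher-ranked samples
    contribute more to the pooled estimate.
    """
    ranked = sorted(samples, reverse=True)
    weights = [1.0 / (r + 1) for r in range(len(ranked))]
    return sum(w * s for w, s in zip(weights, ranked)) / sum(weights)

def plurality_consensus(labels):
    """Pick the most frequent free-text emotion label across models."""
    return Counter(labels).most_common(1)[0][0]

# Illustrative use: three repeated valence ratings from one model,
# then one label per model (Gemini, GPT-4o, Claude).
valence = rank_weighted_pool([0.6, 0.4, 0.5])
emotion = plurality_consensus(["curiosity", "confusion", "curiosity"])
```

Here `valence` lands between the minimum and maximum input ratings, and `emotion` is the label chosen by at least two of the three models; ties and normalization of free-text labels would need additional handling in practice.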


Why I Decided to Let My Students Turn in Essays Written by a Machine

Slate

The writing sounded like the typical 3 a.m. effort. It was the sort of paper that usually makes me wonder: Did this student even come to class? Did I communicate anything of any value to them at all? Except there were no obvious tells that this was the product of an all-nighter: no grammar errors, no misspellings, no departures into the extraneous examples that seem profound to students late at night but definitely sound like the product of a bong hit in the light of day. Perhaps, just before the end of the semester, I was seeing my very first student essay written by ChatGPT?


Analyzing Prosodic Features and Student Uncertainty using Visualization

Xiong, Wenting (University of Pittsburgh) | Litman, Diane J. (University of Pittsburgh) | Marai, G. Elisabeta (University of Pittsburgh)

AAAI Conferences

It has been hypothesized that to maximize learning, intelligent tutoring systems should detect and respond to both cognitive student states and affective and metacognitive states such as uncertainty. In intelligent tutoring research so far, student state detection is primarily based on information available from a single student-system exchange unit, or turn. However, the features used in the detection of such states may have a temporal component, spanning multiple turns, and may change throughout the tutoring process. To test this hypothesis, an interactive tool was implemented for the visual analysis of prosodic features across a corpus of student turns previously annotated for uncertainty. The tool consists of two complementary visualization modules. The first allows researchers to visually mine the feature data for patterns in each individual student dialogue and form hypotheses about feature dependencies. The second allows researchers to quickly test these hypotheses on groups of students through statistical visual analysis of feature dependencies. Results show that significant differences exist among feature patterns across different student groups. Further analysis suggests that feature patterns may vary with student domain knowledge.